Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CSV formatting for 'ginkgo outline' #1490

Merged
merged 4 commits into from
Dec 6, 2024
Merged

Conversation

belden
Copy link
Contributor

@belden belden commented Dec 5, 2024

Ensure CSV formatting properly encodes various edge-case values.

We lean on encoding/csv from the go stdlib to do the work for
us. In order to mix writing indentation with writing csv rows, we must
csvWriter.Flush() after every csvWriter.Write(row).

Adds a new test file, and updates existing test .csv files to expect
proper CSV encoding. The commit messages on each commit may be useful
to the reviewer.

This pull request resolves #1489

Describe,104,240,104,240,false,false,false,""
It,209,236,209,236,true,false,false,""
Describe,"104,240",104,240,false,false,false,
It,"209,236",209,236,true,false,false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I laughed to discover that ginkgo/outline/_testdata/position_test.go already has text in place which should have forced using a CSV writer. The test in question:

package example_test
import (
. "github.com/onsi/ginkgo/v2"
)
// Describe start=104, end=240
var _ = Describe("104,240", func() {
/*
* block comment
*
*/
// line comment
// It start=209, end=236
It("209,236", func() {
})
})

A test which uses a csv Reader to load each _testdata/*_test.go.csv would have caught this, but it's easy to miss out on writing meta-tests.

In order not to clutter this PR, I have not added such a test to the suite. If the reviewers like, I can add such a commit onto this branch; or provide a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a picture to hopefully illustrate the previous error in this file:

image

Counting the comma-separated fields in each line, you'll see:

Comment Row Fields
Header Name,Text,Start,End,Spec,Focused,Pending,Labels 8
Removed line Describe,104,240,104,240,false,false,false,"" 9
Removed line It,209,236,209,236,true,false,false,"" 9
Added line Describe,"104,240",104,240,false,false,false, 8
Added line It,"209,236",209,236,true,false,false, 8

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha! this is great, thanks. yes please feel free to add that test to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

I wrote the test and saw it pass on the HEAD of my branch. I rebased it to pretend I'd written this meta-test as the first, rather than last, commit on this branch.

Then I just checked out each commit and ran

 belden@laptop github/onsi/ginkgo/ginkgo/outline $ go test .

to see how the test failed. With the meta-test as the first commit, it properly fails due to outline/_testdata/position_test.go.csv having the extra fields, as shewn in the screenshot above.

The test continues to fail in various ways for the subsequent commits; and when we finally get to the HEAD of this branch, the test passes as expected.


I broke the test up into two functions so as not to make the DescribeTable test body be too long. When I initially looked at that file, I immediately saw the super helpful comment about how to extend the test. I wanted to let the next person immediately see that same helpful comment.

This test fails on this commit, which shows that it properly detects
that

    outline/_testdata/position_test.go.csv

can not be correctly parsed due to having too many fields.

This test passes on the HEAD of this branch.
I generated the _test.go.{csv,json} files by using a version of ginkgo
built from the next commit.

If tests are run on this sha, they are expected to fail; the next
commit fixes them.
Ensure CSV formatting properly encodes various edge-case values.

We lean on `encoding/csv` from the go stdlib to do the work for
us. In order to mix writing indentation with writing csv rows, we must
csvWriter.Flush() after every csvWriter.Write(row).
The trailing `""` to wrap empty labels was a CSV error. This commit
simply updates the test files to be properly encoded. It is a result
of running the updated `ginkgo` binary from the previous commit across
all of the _testdata files.

----

There was prior art within `outline/outline.go` which tried to
properly enclose a comma-separated list of labels within double
quotes. This resulted in CSV output with trailing `""` characters for
tests that have no labels.

A trailing `""` is incorrect for CSV formatting.

There are two special characters within CSV formatting:

    "  0x22   DOUBLE QUOTE
    ,  0x2c   COMMA

The mechanism to embed the field-separator 0x2c within a string is to
enclose it within quotation marks. The mechanism to embed the
quotation mark within a string is to use two quotation marks. i.e.
the following string:

    // go code
    foo := "The dog looked at me and said, \"Bark, bark!\""

should be represented in CSV as

    Punchline,Laughs
    "The dog looked at me and said, ""Bark, bark!""",17

just as it would be represnted in JSON as

    {
        "punchline": "The dog looked at me and said, \"Bark, bark!\"",
        "laughs": 17
    }
@belden belden force-pushed the fix-csv-formatting branch from f72c2b9 to 407c805 Compare December 5, 2024 06:46
@onsi onsi merged commit 63d113a into onsi:master Dec 6, 2024
6 checks passed
@onsi
Copy link
Owner

onsi commented Dec 6, 2024

Thank you! I'll cut a new release soon.

@onsi
Copy link
Owner

onsi commented Dec 10, 2024

hey @belden I was going to cut a new release but, unfortunately, this commit:

3d0e8f1

checked in a few test binaries. This bloats the repo and makes subsequent clones much slower.

I think I'm going to need to rewrite git history to undo this and I think the simplest thing will be to revet the master branch to the v2.22.0 tag (the commit just before I pulled in your PR). I'll update the .gitignore to make sure this doesn't happen again. Do you have any thoughts/ideas for an alternative approach to remove the files from the git history?

@belden
Copy link
Contributor Author

belden commented Dec 10, 2024

Ah nerds! Sorry about that @onsi - I saw those files show up, and thought they looked weird. I'm sorry for the extra work for you here.

Here's an approach that works and doesn't tie you to getting a new PR, or otherwise needing to replay the work:


  1. Make a fresh clone of your repo

    cd /tmp/
    git clone https://github.com/onsi/ginkgo temp-ginkgo
  2. Hop into that repo and pretend we never merged my PR

    cd /tmp/temp-ginkgo
    git reset --hard v2.22.0
  3. Make a new branch to work on. We'll use origin/master as the branch point, since that still contains the work we're interested in.

    git checkout -b temp-branch origin/master
  4. Find and remove the *.test files. Note that the trailing + on the next line is part of the find command.

    find . -name '*.test' -exec rm {} +
  5. Commit the removal, with a special commit message. Note the sha here is the one you identified.

    git commit -m 'fixup! 3d0e8f1'
  6. Tell git rebase to cuddle removing the *.test files back onto the offending commit. We need to set some variables to ensure your editor doesn't needlessly have you edit commit messages.

    GIT_EDITOR=/bin/true EDITOR=/bin/true git rebase -i --autosquash --autostash 3d0e8f1^

Now check the temp-branch. Some useful checks here:

  1. Validate the tagged sha for v2.22.0 is still 9e2f337b10cf4973bbc9ae4aeac5a3a334c3571a

    git log --pretty=oneline --n 5 --decorate=full
  2. Validate *.test are not present on the temp-branch history:

    git diff --name-only v2.22.0... | grep '\.test'

If the history of temp-branch is to your satisfaction, then you can merge it into your local master:

git checkout master
git merge temp-branch

And then you're a force-push away from wiping out this embarrassing mistake from me

git push --force-with-lease

I tried a few different approaches for fixing this, and the above one happened to be the easiest thing that worked. git rebase --exec $'...' might be able to work, but it eluded me.
Likewise git filter-branch.

This is trivial to do with git-filter-repo but that tool operates across the entire history of the repo, and rewrites all the SHA1s. Invalidating all of your tags seemed like it would be a no-go.

The automatic fixup trick is one that I use a lot, though behind the veil of magit.el. And it's easy enough to check each step, so it's what I've presented here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSV data are not properly escaped by ginkgo outline
2 participants